Negative Example Selection for Protein Function Prediction: The NoGO Database

Authors

  • Noah Youngs
  • Duncan Penfold-Brown
  • Richard Bonneau
  • Dennis Shasha
Abstract

Negative examples - genes that are known not to carry out a given protein function - are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text classification, which are the current state-of-the-art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (the NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).
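To make the idea of conditional-probability-based negative selection concrete, the following is a minimal Python sketch. It is not the authors' published algorithm: the function names, the averaging rule, and the toy annotation data are all assumptions made for illustration. For a target function, it estimates P(target | term) from co-annotation counts, then ranks genes that lack the target by the mean of those probabilities over their own terms. Genes with the lowest scores are the more plausible negative examples. Under the Open World Assumption a missing annotation is not proof of absence, so the output is a ranking of candidates rather than a set of confirmed negatives.

```python
from collections import Counter


def conditional_probabilities(annotations, target):
    """Estimate P(target | term) for every non-target term.

    `annotations` maps a gene id to the set of terms it is annotated with.
    (Hypothetical data layout, chosen for this sketch.)
    """
    term_counts = Counter()   # genes annotated with each term
    joint_counts = Counter()  # genes annotated with both term and target
    for terms in annotations.values():
        has_target = target in terms
        for term in terms:
            if term == target:
                continue
            term_counts[term] += 1
            if has_target:
                joint_counts[term] += 1
    return {t: joint_counts[t] / term_counts[t] for t in term_counts}


def rank_negative_candidates(annotations, target):
    """Rank genes lacking `target`, most plausible negatives first.

    Score = mean P(target | term) over the gene's own terms (an assumed,
    simplified scoring rule). Unannotated genes are skipped because they
    carry no evidence either way.
    """
    cond = conditional_probabilities(annotations, target)
    scores = {}
    for gene, terms in annotations.items():
        if target in terms or not terms:
            continue
        scores[gene] = sum(cond[t] for t in terms) / len(terms)
    # Ties are broken by gene id so the ordering is deterministic.
    return sorted(scores, key=lambda g: (scores[g], g))


if __name__ == "__main__":
    toy = {
        "geneA": {"kinase", "atp_binding"},
        "geneB": {"kinase", "atp_binding", "membrane"},
        "geneC": {"atp_binding", "transport"},
        "geneD": {"transport", "membrane"},
        "geneE": {"dna_binding"},
    }
    print(rank_negative_candidates(toy, "kinase"))
```

In the toy data, geneC shares the term atp_binding with both known kinases, so it receives the highest score and is the least safe choice as a negative example for "kinase".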

Related resources

In Silico Prediction and Docking of Tertiary Structure of Multifunctional Protein X of Hepatitis B Virus

Hepatitis B virus (HBV) infection is a universal health problem and may result in acute, fulminant, or chronic hepatitis, liver cirrhosis, or hepatocellular carcinoma. The sequence of protein X of HBV was retrieved from the UniProt database. ProtParam from the ExPASy server was used to investigate the physicochemical properties of the protein. Homology modeling was carried out using the Phyre2 server, and refin...


Simple and Constrained Selection Indices with and without Calving Interval Included in Selection Goal Function for Holstein Cows of Iran

Two selection goals, one including and one excluding calving interval (CI) in the selection goal function for Holstein cows in Iran, alongside milk yield, milk fat percentage, and milk protein percentage, were studied. Four selection indices were composed using information on production traits, CI, and/or days from calving to first insemination (DFI). The results of the predicted genetic grow...


The Sphingolipid Receptor S1PR2 Is a Receptor for Nogo-A Repressing Synaptic Plasticity

Nogo-A is a membrane protein of the central nervous system (CNS) restricting neurite growth and synaptic plasticity via two extracellular domains: Nogo-66 and Nogo-A-Δ20. Receptors transducing Nogo-A-Δ20 signaling remained elusive so far. Here we identify the G protein-coupled receptor (GPCR) sphingosine 1-phosphate receptor 2 (S1PR2) as a Nogo-A-Δ20-specific receptor. Nogo-A-Δ20 binds S1PR2 on...


Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

A DNA sequence, although it contains all genetic traits, is not a functional entity. Instead, it is converted into protein sequences through transcription and translation. The protein sequence later takes on a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can imagine is undertaken by proteins with specific functio...


Prediction of defense mechanism styles based on executive functions

Combining psychoanalysis and neurology feels peculiar at first. The combination tries to bind concepts of psychoanalysis with neuroscience in order to provide integrated knowledge for a better understanding of the human mind. The interaction between psychoanalysis and neuroscience has come to attention during the past three decades. The purpose of this study is to investigate and find correlation b...


Journal title:

Volume 10, Issue

Pages -

Publication date: 2014